Abstract: 

A DNA profile from the perpetrator does not reveal, per se, the circumstances by 
which it was transferred. Body fluid identification by mRNA profiling may allow 
extraction of contextual 'activity level' information from forensic samples. Here we 
describe the development of a prototype multiplex digital gene expression (DGE) method 
for forensic body fluid/tissue identification based upon solution hybridization of 
color-coded NanoString® probes to 23 mRNA targets. The method identifies peripheral 
blood, semen, saliva, vaginal secretions, menstrual blood and skin. We showed that a 
simple 5-minute, room-temperature cellular lysis protocol gave results equivalent to 
standard RNA isolation from the same source material, greatly enhancing the ease of use 
of this method
in forensic sample processing. 

We first describe a model for gene expression in a sample from a single body 
fluid and then extend that model to mixtures of body fluids. We then describe calculation 
of maximum likelihood estimates (MLEs) of body fluid quantities in a sample, and we 
describe the use of likelihood ratios to test for the presence of each body fluid in a 
sample. Known single source samples of blood, semen, vaginal secretions, menstrual 
blood and skin all demonstrated the expected tissue-specific gene expression for at least 
two of the chosen biomarkers. Saliva samples were more problematic, with their 
previously identified characteristic genes exhibiting poor specificity. Nonetheless the 
most specific saliva biomarker, HTN3, was expressed at a higher level in saliva than in 
any of the other tissues. 

Crucially, our algorithm produced zero false positives across this study's 89 
unique samples. As a preliminary indication of the ability of the method to discern 
admixtures of body fluids, five mixtures were prepared. The identities of the component 
fluids were evident from the gene expression profiles of four of the five mixtures. 
Further optimization of the biomarker 'CodeSet' will be required before it can be used in 
casework, particularly with respect to increasing the signal-to-noise ratio of the saliva 
biomarkers. With suitable modifications, this simplified protocol with minimal hands-on 
requirement should facilitate routine use of mRNA profiling in casework laboratories. 
1. Introduction 

Genetic identification of the donor of transferred biological traces deposited at the 
crime scene or on a person using STR analysis is now routine practice worldwide [1]. 
This represents potentially crucial 'source level' information for investigators [2]. A 
DNA profile from the perpetrator does not, however, reveal the circumstances by which it 
was transferred. This contextual information (sometimes known as the 'activity level' in 
Cook and Evett's classic 1998 paper [2]) is important for casework investigations 
because the deposition of the perpetrator's biological material requires some behavioral 
activity that results in its transfer from the body. The consequences of different modes of 
transfer of the DNA profile may dramatically affect the investigation and prosecution of 
the crime. For example, a DNA profile from a victim that originates from skin versus the 
same DNA profile that originates from vaginal secretions may support social or sexual 
contact, respectively. Thus tissue/body fluid sourcing of the DNA profile should be an 
important concern for, and service from, forensic genetics practitioners who are integral 
to the investigative team. The problem is that, until recently, it was not possible 
to definitively identify many of the important body fluids of interest (e.g. vaginal 
secretions, saliva, and menstrual blood). 

In order to overcome the limitations of currently used classical body fluid 
identification methods, the use of messenger RNA (mRNA) profiling, as described by 
Juusola & Ballantyne [3], was proposed to supplant conventional methods for body fluid 
identification. Terminally differentiated cells, whether they comprise blood monocytes or 
lymphocytes, ejaculated spermatozoa, epithelial cells lining the oral cavity or epidermal 
cells from the skin, acquire their identity through a developmentally regulated program in which 
certain genes are turned off (i.e. transcriptionally silent) and others are turned on (i.e. are actively 
transcribed and translated into protein) [4]. Thus, a pattern of gene expression is 
produced that is unique to each cell type in both the presence and the relative abundance 
of specific mRNAs [4]. The type and abundance of mRNAs, if determined, would then 
permit a definitive identification of the body fluid or tissue origin of forensic samples. 
This is the basis for mRNA profiling for body fluid identification. RNA profiling now 
offers the ability to identify all forensically relevant biological fluids using methods 
compatible with the current DNA analysis pipeline [5,6]. Despite the identification of 
numerous body fluid specific candidates there is some reluctance to utilize RNA profiling 
assays in the forensic community due to concerns over the perceived instability of RNA 
in biological samples. However, several studies have been conducted in order to assess 
the stability of RNA in dried forensic stains [7-10]. These have demonstrated that RNA 
of sufficient quantity and quality for analysis can be recovered from aged and 
environmentally compromised forensic samples [7-10]. The effective stability (i.e. 
'recoverability') of mRNA in aged and compromised samples is not dissimilar to that of 
DNA and provides support to the use of mRNA profiling assays in forensic casework 
(Ballantyne, unpublished observations). The recently published EDNAP collaborative 
exercises on mRNA profiling for body fluid identification further demonstrate a 
significant interest in mRNA profiling by the forensic community in Europe and around 
the world as well as the ease with which this technology can be implemented into forensic 
casework laboratories [11-15]. Collectively, these studies demonstrate an interest in the 
use of mRNA profiling in forensic casework and its suitability for use with forensic 
samples, and therefore warrant continued evaluation and development. Other classes of 
RNA also exist in the cell and one in particular, microRNA (miRNA), has been 
investigated for potential forensic use since the short size of the molecule (~21-25 bases) 
makes it an attractive option for analyzing degraded specimens [16-22]. The field of 
forensic miRNA profiling, although promising, is less mature in terms of there being an 
international consensus on the identity and specificity of the best body fluid specific 
miRNA targets. Other non-RNA methods for body fluid identification have been recently 
investigated including the use of epigenetic [23-29] and proteomic [30-32] biomarkers. 
Although exhibiting some promise, epigenetic markers have not been identified for all of 
the important common body fluids and tissues such as vaginal secretions and skin. 
Proteomic markers suffer from a lack of inter-laboratory reproducibility studies and a 
paucity of peer-reviewed reports demonstrating their forensic validity. 

Gene expression differences are quantitative in nature, meaning that a particular 
biomarker may be expressed in a particular cell type at low, intermediate or high levels. 
Even when it is not generally regarded as being expressed in a particular cell type it may 
exhibit basal level (or 'leaky') transcription with a few molecules present per cell. Thus 
far there have been three main methods developed for mRNA profiling of forensic 
samples: capillary electrophoresis (CE)-based analysis [5-7,33-36], quantitative RT-PCR 
(qRT-PCR) [7,37-39] and, more recently, high resolution melt (HRM) analysis [40]. Due 
to its facile multiplex capabilities and routine use in DNA profiling, CE-based analysis 
has been the platform of choice for casework mRNA assays [5,6]. However post-PCR 
CE peak heights/areas are, at best, semi-quantitative in nature with respect to biomarker 
expression levels. Similarly, HRM signal amplitude does not appear to correlate precisely 
with RNA input [40]. Although qRT-PCR permits quantitation of biomarker targets, its 
low multiplex capability (typically 3-4 targets maximum compared to >20 for CE) 
appears to have limited its use. 

In contrast to the aforementioned, digital gene expression (DGE) methods 
precisely count the number of individual transcripts in a sample [41] which facilitates the 
use of advanced statistical methods to better evaluate and interpret the experimental data. 
This facility would be expected to be of significant benefit when analyzing body fluid 
mixtures that are commonly encountered in forensic analysis. Deep sequencing of the 
transcriptome using next generation sequencing (NGS) technologies is capable of directly 
identifying and quantifying (by counting) all mRNA transcripts in a sample, a DGE 
technique known as RNA sequencing (RNA-Seq) [42]. RNA-Seq has been spectacularly 
successful in advancing our knowledge of cell-type-specific gene expression including 
transcript quantification and elucidation of their sequence diversity [42]. Although NGS 
heralds a new era of forensic genomics, impediments to its routine implementation in 
body fluid RNA analysis include its high cost of reagents and time-consuming, complex 
analysis. In this work we sought an alternative DGE method to NGS that is simpler and 
requires minimal hands-on experimentation. Here we describe the development of a 
prototype multiplex DGE method for forensic body fluid identification based upon 
solution hybridization of color-coded NanoString® probes [43] to 23 tissue/body fluid 
specific and 10 housekeeping gene mRNA targets present in forensic type samples. 
Concomitantly, to facilitate routine use, we also devised a simple 5 minute room 
temperature cellular lysis protocol as an alternative to standard RNA isolation for 
forensic sample processing. 
2. Methods 

2.1 Body fluid samples 

Body fluids were collected from volunteers using procedures approved by the 
University's Institutional Review Board. Informed written consent was obtained from 
each donor. Blood samples were collected by venipuncture into vacutainers (K3-EDTA 
preservative) and 50 µl aliquots were placed onto cotton cloth and dried at room 
temperature. Freshly ejaculated semen was provided in sealed plastic tubes and stored 
frozen. After thawing, the semen was absorbed onto sterile cotton swabs and allowed to 
dry. Buccal samples (saliva) were collected from donors using sterile swabs by swabbing 
the inside of the donor's mouth. Semen-free vaginal secretions and menstrual blood were 
collected using sterile cotton swabs. Admixed body fluid samples were created by 
combining ½ of a 50 µl stain or single cotton swab from each body fluid. Environmental 
samples were prepared by exposing body fluid samples to outdoor ambient heat, light 
and humidity, either protected ('covered') or unprotected ('uncovered') from precipitation, for 
varying lengths of time (Supplementary Table 1). Human skin total RNA was obtained 
from commercial sources: Stratagene/Agilent Technologies (Santa Clara, CA), Biochain® 
(Hayward, CA), Zenbio (Research Triangle Park, NC), and Zyagen (San Diego, CA). 
Human brain total RNA was obtained from a commercial source (Biochain®) (run as an 
internal positive control and not used in any data analysis). Cellular skin samples were 
collected by swabbing human skin or a touched object surface with a sterile swab pre-moistened 
with sterile water. For all RNA isolations, ½ or a whole 50 µl stain or single cotton 
swab was used. All samples were stored at -20°C until needed, except for the total RNA 
samples which were stored at -47°C. 
Suspected bio-particles from male shirt collar samples were collected as 
previously described [44]. Briefly, WF Gel-Film® (x8 retention level; Gel-Pak®, 
Hayward, CA) was cut to a size appropriate for subsequent attachment to a glass 
microscope slide support (3" x 1" x 1mm, Fisher Scientific, Suwanee, GA). Using sterile 
tweezers, the back protective covering was removed to expose the adhesive back and the 
Gel-Film® was placed onto a clean glass microscope slide. The top protective plastic film 
layer was then removed using re-sterilized tweezers. The Gel-Film® surface was then 
touched to the sample area (direct skin, clothing or object surface) several 
times to ensure sufficient transfer of biological material. Samples were stained with 
Trypan Blue (0.4%) (Sigma-Aldrich, St. Louis, MO) for 1 minute, then washed briefly by 
gentle flooding with sterile ultrapure water with a resistivity of 18.2 MΩ·cm at 25°C. Samples 
were then air-dried at room temperature prior to proceeding to sample collection. All 
samples were stored at room temperature in microscope slide boxes protected from light. 
Bio-particles were viewed, imaged and collected using a Leica M205C stereomicroscope 
(Micro Optics of FL, Inc, Davie, FL). Twenty-five, fifty and one hundred bio-particles 
(i.e. single cells or 'cellular agglomerates') were collected. Bio-particles were collected 
from the Gel-Film® surface using 3M™ water-soluble wave solder tape (5414 transparent) on 
the end of a tungsten needle. The 3M™ water-soluble adhesive was adhered to a clean 
glass microscope slide using double-sided tape and collected on the end of a tungsten 
needle under the stereomicroscope. The collected bio-particles were then transferred into 
a sterile 0.2 ml PCR flat-cap tube (Phenix Research, Candler, NC) containing lysis 
buffer. For 100 bio-particle shirt collar samples, 10 µl of lysis buffer solution was used 
(2.1X buffer-blue, 10% forensicGEM™ reagent (ZyGEM forensicGEM™ tissue kit, VWR, 
Suwanee, GA), sterile water); for 25 and 50 bio-particle shirt collar samples, 5 µl of lysis 
buffer solution was used (1X buffer-silver, 5% RNAGEM™ reagent (ZyGEM RNAGEM™ 
tissue kit, VWR), sterile water). 

2.2 RNA Isolation 

Total RNA was extracted from blood, semen, saliva, vaginal secretions, menstrual 
blood and skin using a manual organic RNA extraction (guanidine 
isothiocyanate-phenol:chloroform) as previously described [33,45]. Briefly, 500 µl of 
pre-heated (56°C for 10 minutes) denaturing solution (4 M guanidine isothiocyanate, 
0.02 M sodium citrate, 0.5% sarkosyl, 0.1 M β-mercaptoethanol) was added to a 1.5 ml 
Safe-Lock extraction tube (Eppendorf, Westbury, NY) containing the stain or swab. The 
samples were incubated at 56°C for 30 minutes. The swab or stain pieces were then placed 
into a DNA IQ™ spin basket (Promega, Madison, WI), re-inserted into the original 
extraction tube, and centrifuged at 14,000 rpm (16,000 x g) for 5 minutes. After 
centrifugation, the basket with swab/stain pieces was discarded. To each extract, 50 µl of 
2 M sodium acetate and 600 µl of acid phenol:chloroform (5:1), pH 4.5 (Ambion by Life 
Technologies) were added. The samples were then centrifuged for 20 minutes at 14,000 
rpm (16,000 x g). The RNA-containing top aqueous layer was transferred to a new 1.5 ml 
microcentrifuge tube, to which 2 µl of GlycoBlue™ glycogen carrier (Ambion by Life 
Technologies) and 500 µl of isopropanol were added. RNA was precipitated for 1 hour at 
-20°C. The extracts were then centrifuged for 20 minutes at 14,000 rpm (16,000 x g). The 
supernatant was removed and the pellet was washed with 900 µl of 75% ethanol/25% 
DEPC-treated water. Following a centrifugation for 10 minutes at 14,000 rpm (16,000 x 
g), the supernatant was removed and the pellet dried using vacuum centrifugation for 3 
minutes. Twenty microliters of pre-heated (60°C for 5 minutes) nuclease-free water 
(Ambion by Life Technologies) was added to each sample followed by an incubation at 
60°C for 10 minutes. All extracts were DNase treated to remove residual DNA using the 
TURBO DNA-free™ kit (Applied Biosystems (AB) by Life Technologies, Carlsbad, CA) 
according to the manufacturer's protocol. With each extraction, a negative control 
(extraction reagents without sample) was included. 

Downloaded from http://biorxiv.org/on September 18, 2014 

Alternatively, total RNA was extracted from blood, semen, saliva, vaginal 
secretions, menstrual blood and skin using direct lysis without purification. One hundred 
microliters of Buffer RLT Plus (QIAGEN, Germantown, MD) with 1 µl β-mercaptoethanol 
was added to a 1.5 ml Safe-Lock extraction tube (Eppendorf, Westbury, 
NY) containing the stain or swab. The samples were incubated at room temperature for 5 
minutes with constant vortexing (20 second intervals). The swab or stain pieces were then 
placed into a DNA IQ™ spin basket (Promega, Madison, WI), re-inserted back into the 
original extraction tube, and centrifuged at 14,000 rpm (16,000 x g) for 5 minutes. After 
centrifugation, the basket with swab/stain pieces was discarded. All samples were stored 
at -20°C until needed. 

Total RNA was extracted from bio-particles using the ZyGEM forensicGEM™ or 
RNAGEM™ tissue kits (VWR). For the forensicGEM™ kit, samples were lysed at 75°C 
for 15 minutes. For the RNAGEM™ kit, samples were lysed at 75°C for 5 minutes. All 
samples were stored at -20°C until needed. 
2.3 RNA Quantitation 

RNA extracts (manual organic RNA extraction only) were quantitated with 
Quant-iT™ RiboGreen® RNA Kit (Invitrogen by Life Technologies, Carlsbad, CA) as 
previously described [33,45]. Fluorescence was determined using a Synergy™ 2 Multi- 
Mode microplate reader (BioTek® Instruments, Inc., Winooski, VT). 

2.4 NanoString® Technology 

NanoString® standard gene expression chemistry utilizes two ~50-base probes, the 
reporter probe and the capture probe, for each mRNA target of interest [43]; when 
multiplexed, the probe pairs are referred to as a CodeSet. A multiplex CodeSet can be 
designed to have probe pairs targeting between 20 and 800 mRNAs. Each 
capture/reporter probe pair within the CodeSet is specifically designed to hybridize to an 
individual mRNA target. The reporter probe carries the signal, a unique molecular 
fluorescent barcode, and binds to the 5' end of the mRNA target. The 
capture probe binds to the 3' end of the mRNA target and adheres the capture 
probe/barcode/target complex to the cartridge surface for data collection (see Figure 1). 
After overnight hybridization at 65 °C in a thermal cycler (typical time of 12-24 hours), 
the complex is purified on the nCounter Prep Station with excess, unbound probes 
removed and intact complexes bound, stretched and immobilized on an nCounter 
Cartridge. Sample cartridges are then placed onto the nCounter Digital Analyzer for 
counting and data collection of each target complex. The number of times each barcode is 
counted is proportional to the abundance of that mRNA target in a given sample. 

In this study, a NanoString® multiplex custom CodeSet was designed and created 
to target 23 genes known to be differentially expressed in forensically relevant body 
fluids and tissues. As a reference, 10 ubiquitously expressed housekeeping genes were 
also included in the CodeSet, giving a 33-plex total. The body fluids and tissues targeted 
include: venous blood, menstrual blood, semen, saliva, vaginal secretions, and skin. The 
multiplex CodeSet consisted of 3 venous blood genes (ALAS2, ANK1, HBB) [11,13,34], 
2 menstrual blood genes (LEFTY2, MMP10) [15,34,36], 3 saliva genes (HTN3, MUC7, 
STATH) [3,14,34], 3 semen genes (PRM2, SEMG1, TGM4) [14,34], 5 skin genes 
(CCL27, IL1F7, KRT9, LCE1C, LCE2D) [33,46], 7 vaginal secretion genes (CYP2A7, 
CYP2B7P1, DKK4, FUT6, IL19, MYOZ1, NOXO1) [45] and 10 reference (i.e. 
housekeeping) genes (B2M, COX1, HPRT1, PGK1, PPIH, S15, TCEA1, TFRC, UBC, 
UBE2D2) (Table 2). The CodeSet also included 6 positive control probes and 8 negative 
control probes. The 6 positive control probes are designed to assess overall assay 
performance and to normalize the data, accounting for any assay variability within the 
system. The 8 negative control probes have no corresponding targets within the sample 
and assess background noise in the system. 
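The text gives the purpose of the control probes but not a formula. As a hedged illustration, the sketch below assumes one common way of using them with nCounter data, which is not necessarily the procedure used in this study: each sample receives a scaling factor equal to the average, across samples, of the positive-control geometric means divided by that sample's own positive-control geometric mean, and the mean of the negative-control counts serves as a crude background estimate. The function names and counts are hypothetical.

```python
import math
from statistics import mean


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def positive_control_factors(pos_counts):
    """Per-sample scaling factors from positive-control probe counts.

    pos_counts maps a sample id to that sample's positive-control counts.
    A sample whose controls read low relative to the average gets a factor
    above 1; one whose controls read high gets a factor below 1.
    """
    gms = {sample: geometric_mean(c) for sample, c in pos_counts.items()}
    reference = mean(list(gms.values()))
    return {sample: reference / gm for sample, gm in gms.items()}


def negative_background(neg_counts):
    """Mean count of the negative-control probes, a simple background estimate."""
    return mean(neg_counts)


# Hypothetical counts: six positive controls per sample, eight negatives.
factors = positive_control_factors({
    "A": [100, 100, 100, 100, 100, 100],
    "B": [200, 200, 200, 200, 200, 200],
})
print(factors)
print(negative_background([3, 5, 4, 6, 2, 5, 4, 3]))
```

Multiplying a sample's target counts by its factor places all samples on the positive-control scale of an average sample; the geometric mean keeps a single unusually bright control probe from dominating the factor.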

A total of 96 assays were included in this study, involving 89 samples with technical 
replicates for 7 of the samples. A detailed summary of the 89 samples is provided in 
Table 1 and includes 14 blood, 17 semen, 17 saliva, 10 vaginal secretions, 10 menstrual 
blood, and 14 skin samples as well as 5 mixtures and 2 RNA-free controls. For each body 
fluid both standard and challenging or environmentally compromised samples were 
evaluated. Full sample descriptions, including the number of donors and the input (ng of 
total RNA or volume (µl) of extract) used for each of the 96 samples, are provided in 
Supplementary Table 1. 
Hybridization assays were performed according to the standard NanoString® gene 
expression assay protocol, as follows: Each individual assay consisted of 10 µl Reporter 
Probe, 10 µl Hybridization Buffer, 5 µl Capture Probe and the specified RNA sample 
input (in most cases, 50 ng of total RNA or 5 µl crude lysate) for a total reaction volume 
of 30 µl. Assays were placed into a thermal cycler at 65°C with a 70°C lid, and allowed 
to hybridize overnight for approximately 16 hours. Following this, assays were placed 
onto the nCounter Prep Station using the high-sensitivity protocol for purification and 
immobilization of the hybridized targets on the imaging cartridge. The cartridges were 
then scanned on the nCounter Digital Analyzer for counting of the hybridized targets, and 
data files were exported for analysis. 

2.5 Statistical Methods 
2.5.1 Overview of method 

Our approach to the problem is motivated by three properties of bona fide 
casework samples: they often (i) comprise mixtures of two or more fluids, (ii) are limited 
in quantity and (iii) could be either partially or highly degraded. Our basic approach is as 
follows: First, we model the probability distribution of gene expression in body fluid 
samples. Next, we use this model to calculate the Maximum Likelihood Estimate (MLE) 
for the levels of each body fluid in a sample and to calculate the log-likelihood of a 
sample's profile given the estimated levels of each fluid. We then construct a likelihood 
ratio comparing the likelihood of a given sample's profile with and without the presence 
of a given fluid. If a sample's profile is far more likely when we include a specific fluid 
in the model, then we conclude the fluid is present in the sample. 
2.5.1 Modeling gene expression in body fluids 

Gene expression is best modeled on the log (multiplicative) scale: a doubling of a 
gene's expression level is generally considered a change comparable in magnitude to a 
halving of its expression level, and a gene increasing from 200 to 400 mRNA transcripts 
is as meaningful a difference in gene expression as a gene increasing from 2000 to 4000 
counts. However, the mathematics of mixtures is additive: if a sample is half blood and 
half saliva, a gene's cumulative expression level will result from the summation of its 
expression levels in each tissue. We therefore model the contributions of each fluid to a 
mixture on the linear scale, but we measure discrepancies between observed and 
predicted expression on the log scale. 
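As a minimal illustration of the two scales (made-up counts, hypothetical function names), contributions to a mixture are summed on the linear scale, while the discrepancy between observed and predicted expression is a difference of natural logs, so doubling and halving are equally large and a change from 200 to 400 matches one from 2000 to 4000:

```python
import math


def mixture_expectation(contributions):
    """Expected count of one gene in a mixture: per-fluid contributions add."""
    return sum(contributions)


def log_discrepancy(observed, predicted):
    """Observed-versus-predicted discrepancy on the (natural) log scale."""
    return math.log(observed) - math.log(predicted)


# Hypothetical counts of one gene contributed by the blood and saliva halves.
predicted = mixture_expectation([300.0, 20.0])

# Doubling and halving the prediction give discrepancies of equal magnitude.
print(log_discrepancy(2 * predicted, predicted), log_discrepancy(predicted / 2, predicted))

# A change from 200 to 400 counts is as large as a change from 2000 to 4000.
print(math.isclose(log_discrepancy(400, 200), log_discrepancy(4000, 2000)))
```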

We develop the algorithm as follows: As a conceptual starting point, we first 
describe a model for gene expression in a sample from a single fluid. We then extend this 
model to mixtures of fluids. From there we describe calculation of maximum likelihood 
estimates (MLEs) of fluid quantities in a sample, and we describe the use of likelihood 
ratios to test for the presence of a fluid in a sample. 

2.5.2 Model for gene expression in a sample from a single body fluid 

On average, each gene represents a given proportion of total gene expression in 
each fluid. For example, in the average blood sample we might expect 15% of total RNA 
to be HBB, 1% to be ALAS2, etc. Call these expected proportions $X_{HBB}$, $X_{ALAS2}$, etc. 
Then in a given blood sample, the vector of expected gene expression is 
$\beta (X_{HBB}, X_{ALAS2}, \ldots)^T$, where $\beta$ is the total amount of RNA in the sample. 
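A short sketch of the single-fluid expectation, using the illustrative 15% HBB and 1% ALAS2 proportions from the example above rather than estimated values; `expected_single_fluid` is a hypothetical helper name:

```python
def expected_single_fluid(beta, proportions):
    """Expected expression vector beta * (X_HBB, X_ALAS2, ...) for one fluid.

    beta: total amount of RNA in the sample (count units).
    proportions: dict gene -> expected proportion of total expression.
    """
    return {gene: beta * x for gene, x in proportions.items()}


# Illustrative proportions from the text's example; not fitted values.
blood_proportions = {"HBB": 0.15, "ALAS2": 0.01}
print(expected_single_fluid(10_000, blood_proportions))
```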
Due to both biological and technical noise, actual expression will vary around its 
expectation. Per the multiplicative nature of gene expression, we model this variability as 
arising from a log-normal distribution, and we assume that each gene is equally variable. 
A single gene's expression in a sample can then be modeled: 

$$\log(y_{HBB}) \sim N\left(\log(X_{HBB}\,\beta),\ \sigma^2\right),$$
where $y_{HBB}$ is the expression of HBB in the sample, and $\sigma^2$ is the variance (on the log 
scale) of HBB's expression around its expectation. 
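The single-gene model can be simulated directly. This sketch assumes natural logs and hypothetical values ($X_{HBB}\beta = 1500$, $\sigma = 0.5$); because the noise is symmetric on the log scale, the median of the simulated counts should sit near the expected value. `log_density` evaluates the normal log-density of $\log(y_{HBB})$ and omits the $1/y$ Jacobian that a density on the count scale would need.

```python
import math
import random


def sample_expression(expected, sigma, rng):
    """One draw of a gene's count with log(y) ~ N(log(expected), sigma**2)."""
    return math.exp(rng.gauss(math.log(expected), sigma))


def log_density(y, expected, sigma):
    """Normal log-density of log(y) with mean log(expected), variance sigma**2."""
    z = math.log(y) - math.log(expected)
    return -0.5 * math.log(2 * math.pi * sigma**2) - z * z / (2 * sigma**2)


# Hypothetical values: expected HBB count X_HBB * beta = 1500, sigma = 0.5.
rng = random.Random(0)
draws = sorted(sample_expression(1500.0, 0.5, rng) for _ in range(5001))
median = draws[len(draws) // 2]  # a log-normal's median is exp(mean of log)
print(round(median))
```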

2.5.3 Model for gene expression in mixtures of body fluids 

The model for mixtures follows naturally from the model for single-fluid 
samples. First, let us define notation. We represent matrices with bold, uppercase 
letters, vectors with bold, lowercase letters, and scalars with lowercase letters. We 
index samples $i \in \{1, \ldots, n\}$, genes $j \in \{1, \ldots, p\}$, and tissues $k \in \{1, \ldots, K\}$. Call the gene 
expression profile for a given sample $\mathbf{y}_i = (y_{i1}, \ldots, y_{ip})^T$, where $y_{ij}$ is the expression of 
gene $j$ in sample $i$. Call $\beta_{ik}$ the amount of fluid $k$ in sample $i$, and call $\boldsymbol{\beta}_i = (\beta_{i1}, \ldots, \beta_{iK})^T$ 
the vector of the amounts of all the fluids in sample $i$. Finally, define the matrix $\mathbf{X}$ to 
represent the expected proportion of each gene in each fluid type, with $X_{jk}$, the 
element in the $j$th row and the $k$th column of $\mathbf{X}$, representing the expected proportion 
of gene $j$ in samples from fluid $k$. 

Assuming the number of mRNA molecules in mixtures of fluids will be a sum 
of the number of mRNA molecules in each component of the mixture, we can write 
the expected counts of gene j in sample i: 
$$E(y_{ij}) = \sum_{k=1}^{K} \beta_{ik} X_{jk},$$
and the expression for the sample's entire expected gene expression vector is simply 
$$E(\mathbf{y}_i) = \mathbf{X}\boldsymbol{\beta}_i.$$
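A sketch of $E(\mathbf{y}_i) = \mathbf{X}\boldsymbol{\beta}_i$ with NumPy, using a hypothetical 3-gene, 2-fluid $\mathbf{X}$ whose columns each sum to 1 (the proportions and amounts are invented for illustration):

```python
import numpy as np

# Hypothetical expected-proportion matrix X (rows: genes, columns: fluids).
# Each column sums to 1, i.e. it is one fluid's expected expression profile.
genes = ["HBB", "ALAS2", "HTN3"]
fluids = ["blood", "saliva"]
X = np.array([
    [0.90, 0.02],
    [0.09, 0.01],
    [0.01, 0.97],
])

# Hypothetical amounts beta_i of each fluid in sample i (count units).
beta_i = np.array([1000.0, 500.0])

# E(y_i) = X beta_i: each gene's expected count is the sum over fluids of
# beta_ik * X_jk, i.e. contributions add on the linear scale.
expected = X @ beta_i
for gene, value in zip(genes, expected):
    print(gene, round(float(value), 1))
```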

Again assuming the variability of gene expression occurs on the log scale, we model 
gene expression in a sample as: 

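$$\log(\mathbf{y}_i) \sim N\left(\log(\mathbf{X}\boldsymbol{\beta}_i),\ \sigma^2\mathbf{I}\right),$$
where $\mathbf{I}$ is the identity matrix and $\sigma^2$ is the common variance (on the log scale) of all 
genes. (Note that if $E(\mathbf{y}_i) = \mathbf{X}\boldsymbol{\beta}_i$, then $E(\log(\mathbf{y}_i)) \neq \log(\mathbf{X}\boldsymbol{\beta}_i)$. However, under the values 
considered in this application, $E(\log(\mathbf{y}_i))$ very closely approximates $\log(\mathbf{X}\boldsymbol{\beta}_i)$.) As we 
lack the data to fully estimate the genes' covariance matrix, we approximate it with 
$\sigma^2\mathbf{I}$. 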
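The parenthetical note can be checked numerically. Under the model, $E(\log(\mathbf{y}_i)) = \log(\mathbf{X}\boldsymbol{\beta}_i)$ exactly, while the mean on the count scale is inflated by the standard log-normal factor $\exp(\sigma^2/2)$, so the two notions of "expected expression" agree more closely as $\sigma^2$ shrinks. The values of $\mu$ and $\sigma$ below are hypothetical:

```python
import math
import random


def mean_inflation(sigma):
    """E(y) / exp(E(log y)) for a log-normal variable: exp(sigma**2 / 2)."""
    return math.exp(sigma**2 / 2)


# Simulate one gene under log(y) ~ N(log(mu), sigma**2) with hypothetical values.
rng = random.Random(1)
mu, sigma = 500.0, 0.3
ys = [math.exp(rng.gauss(math.log(mu), sigma)) for _ in range(200_000)]

empirical = (sum(ys) / len(ys)) / mu  # how far the mean count sits above mu
print(round(mean_inflation(sigma), 4), round(empirical, 3))
```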

Before we can apply the above model for gene expression in body fluids, we 
must estimate two parameters: $\mathbf{X}$, the matrix of expected proportions of gene 
expression, and $\sigma^2$, the variance of gene expression. Estimation of the $\mathbf{X}$ matrix is 
described in Section 3.2. We estimated $\sigma^2$, the variance on the log scale common to 
all genes, as the average variance of each gene in each tissue or fluid. 
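One reading of this estimate, sketched under stated assumptions: the text does not specify the log base, whether the inputs are raw or normalized values, or whether variances use $n$ or $n-1$ denominators, so this hypothetical helper takes natural logs of already-normalized values, uses the sample ($n-1$) variance of each gene within each fluid, and averages those variances over all (fluid, gene) pairs:

```python
import math
import statistics


def pooled_log_variance(training):
    """Estimate the common log-scale variance sigma^2 (assumed procedure).

    training: dict mapping fluid -> list of samples, each sample a list of
    per-gene values with genes in the same order. For every (fluid, gene)
    pair, take the sample variance of log expression across that fluid's
    samples; sigma^2 is the mean of these variances.
    """
    variances = []
    for samples in training.values():
        for j in range(len(samples[0])):
            logs = [math.log(sample[j]) for sample in samples]
            variances.append(statistics.variance(logs))  # n - 1 denominator
    return statistics.mean(variances)


# Hypothetical normalized values for two genes in two fluids.
training = {
    "blood": [[0.50, 0.05], [0.25, 0.10]],
    "semen": [[0.40, 0.20], [0.40, 0.20]],
}
print(pooled_log_variance(training))
```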

2.5.4 Maximum likelihood estimation of the amounts of each tissue or fluid in a 
sample 

Under the assumptions that log gene expression is normally distributed 
around the log of its expectation and that each gene is equally variable, the MLE for 
$\boldsymbol{\beta}_i$ can be calculated as follows: 

$$\hat{\boldsymbol{\beta}}_i = \arg\min_{\boldsymbol{\beta}} \left\lVert \log(\mathbf{y}_i) - \log(\mathbf{X}\boldsymbol{\beta}) \right\rVert_2^2 \quad \text{s.t.} \quad \boldsymbol{\beta} \geq 0,$$
i.e., $\hat{\boldsymbol{\beta}}_i$ minimizes the sum of squared errors on the log scale between the observed 
gene expression $\mathbf{y}_i$ and the predicted gene expression $\mathbf{X}\boldsymbol{\beta}$, subject to the constraint 
that all the elements of $\boldsymbol{\beta}$ are non-negative (a sample cannot have negative amounts 
of a fluid). As it is doubtful that a closed-form solution to this expression exists, we 
use numerical methods to optimize it [47]. The expression is not convex in $\boldsymbol{\beta}$; 
however, we find its estimates to be reasonably robust to differing initial conditions, 
returning similar estimates with very similar log-likelihoods. 
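A minimal sketch of this constrained fit using SciPy's bounded L-BFGS-B optimizer; the study cites [47] for its numerical method, which is not necessarily this one. The helper name, the rescaling of $\boldsymbol{\beta}$ by the sample's total count (done only to keep the optimizer well conditioned) and the small floor inside the logarithm are assumptions of this sketch. Because the objective is not convex, a careful implementation would repeat the fit from several starting points and keep the lowest objective.

```python
import numpy as np
from scipy.optimize import minimize


def fit_fluid_amounts(y, X, b0=None):
    """Constrained MLE: argmin ||log(y) - log(X beta)||^2 subject to beta >= 0.

    y: observed counts for p genes (all > 0); X: p x K expected proportions.
    The optimizer works with b = beta / sum(y); the result is scaled back.
    Returns (beta_hat in count units, sum of squared log-scale errors).
    """
    y = np.asarray(y, dtype=float)
    log_y = np.log(y)
    scale = y.sum()
    K = X.shape[1]
    if b0 is None:
        b0 = np.full(K, 1.0 / K)  # start from equal shares of each fluid

    def objective(b):
        pred = np.maximum(scale * (X @ b), 1e-12)  # guard against log(0)
        resid = log_y - np.log(pred)
        grad = -2.0 * scale * (X.T @ (resid / pred))  # analytic gradient
        return float(resid @ resid), grad

    res = minimize(objective, b0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * K)
    return scale * res.x, float(res.fun)


# Hypothetical 3-gene, 2-fluid example (columns: blood, saliva).
X = np.array([[0.90, 0.02], [0.09, 0.01], [0.01, 0.97]])
y = X @ np.array([1000.0, 500.0])  # a noise-free mixture
beta_hat, sse = fit_fluid_amounts(y, X)
print(beta_hat.round(1), sse)
```

On this noise-free mixture the fit should recover amounts close to (1000, 500) with a near-zero objective; on real data the residual sum of squares will not vanish.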

To prevent the algorithm from overexerting itself trying to fit gene 
expression values in the background of the assay, we found it necessary to add one 
layer of complexity to the model: in addition to fitting $\beta$ terms for each fluid, we 
added a $\beta$ for background, with a corresponding column in the $\mathbf{X}$ matrix with equal 
weights on all genes. We further constrained this background $\beta$ term to contribute 
no more than 15 counts to each gene. For the same reason, we truncated all gene 
expression values at 5 counts, a reasonable estimate of the average background 
counts. 
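The background term and count truncation can be added to the same kind of fit. In this sketch the background column is a column of ones, so its coefficient is directly the number of counts it adds to every gene and the 15-count cap becomes a simple bound; reading "truncated ... at 5 counts" as raising values below 5 to 5 is an assumption, as are the helper and constant names:

```python
import numpy as np
from scipy.optimize import minimize

BACKGROUND_MAX = 15.0  # background may add at most this many counts per gene
COUNT_FLOOR = 5.0      # observed counts below this are raised to it


def fit_with_background(y, X, bg_max=BACKGROUND_MAX, floor=COUNT_FLOOR):
    """Log-scale least squares with an extra, bounded background coefficient.

    A column of ones is appended to X, so the background coefficient equals
    the counts it adds to every gene; it is bounded to [0, bg_max].
    Returns (fluid amounts in count units, background counts per gene).
    """
    y = np.maximum(np.asarray(y, dtype=float), floor)  # truncate low counts
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    K = X.shape[1]
    scale = y.sum()  # optimize b = beta / scale for better conditioning
    log_y = np.log(y)

    def objective(b):
        pred = np.maximum(scale * (Xa @ b), 1e-12)
        resid = log_y - np.log(pred)
        return float(resid @ resid), -2.0 * scale * (Xa.T @ (resid / pred))

    bounds = [(0.0, None)] * K + [(0.0, bg_max / scale)]
    b0 = np.append(np.full(K, 1.0 / K), 0.5 * bg_max / scale)
    res = minimize(objective, b0, jac=True, method="L-BFGS-B", bounds=bounds)
    beta = scale * res.x
    return beta[:K], float(beta[K])


# Hypothetical example: the last gene is expressed by neither fluid, so only
# the background column can explain its counts.
X = np.array([[0.9, 0.0], [0.1, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = X @ np.array([1000.0, 0.0]) + 10.0  # pure blood plus 10 background counts
fluids, background = fit_with_background(y, X)
print(fluids.round(1), round(background, 2))
```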

2.5.5 Using likelihood ratios to test the presence of tissues or fluids 

In any given sample $\mathbf{y}_i$, our goal is to determine which tissues or fluids are 
present. That is, we want to test whether each element of $\boldsymbol{\beta}_i$ equals 0. A reasonable 
approach to this problem is to calculate the likelihood of the data under the MLE $\hat{\boldsymbol{\beta}}_i$ and 
under a constrained MLE $\hat{\boldsymbol{\beta}}_{i,-k}$, with the $\beta_{ik}$ term corresponding to the tissue or fluid in 
question forced to 0. The likelihood ratio under the full and constrained MLEs will 
summarize the evidence for the presence of the tissue or fluid in question. 
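Calculation of a log-likelihood for the data given an MLE is straightforward. 
Under our model, log gene expression is normally distributed around the log of the 
predicted gene expression. Then, up to a constant, the log-likelihood of $\mathbf{y}_i$ given $\hat{\boldsymbol{\beta}}_i$ is: 

$$\operatorname{loglik}(\mathbf{y}_i \mid \hat{\boldsymbol{\beta}}_i) = -\frac{1}{2}\log\det(\sigma^2\mathbf{I}) - \frac{1}{2}\left(\log(\mathbf{y}_i) - \log(\mathbf{X}\hat{\boldsymbol{\beta}}_i)\right)^T (\sigma^2\mathbf{I})^{-1} \left(\log(\mathbf{y}_i) - \log(\mathbf{X}\hat{\boldsymbol{\beta}}_i)\right).$$
To test whether fluid $k$ is present in sample $i$, we evaluate the above expression using $\mathbf{y}_i$ 
and $\hat{\boldsymbol{\beta}}_i$ and again using $\mathbf{y}_i$ and the constrained MLE $\hat{\boldsymbol{\beta}}_{i,-k}$, and we calculate a likelihood 
ratio. 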
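The test can be sketched end to end as follows. This is an illustration under assumptions, not the study's code: it omits the background term for brevity, forces $\beta_{ik} = 0$ by dropping column $k$ of $\mathbf{X}$ (equivalent to fixing that coefficient at zero), uses a hypothetical $\sigma^2 = 0.1$, and reports the natural log of the likelihood ratio. The $\log\det(\sigma^2\mathbf{I})$ term is identical in both fits and cancels, so the log ratio reduces to the difference in squared log-scale residuals divided by $2\sigma^2$.

```python
import numpy as np
from scipy.optimize import minimize


def fit(y, X):
    """Constrained MLE: minimize ||log(y) - log(X beta)||^2 over beta >= 0."""
    log_y, scale, K = np.log(y), float(np.sum(y)), X.shape[1]

    def objective(b):  # b = beta / scale keeps the problem well conditioned
        pred = np.maximum(scale * (X @ b), 1e-12)
        resid = log_y - np.log(pred)
        return float(resid @ resid), -2.0 * scale * (X.T @ (resid / pred))

    res = minimize(objective, np.full(K, 1.0 / K), jac=True,
                   method="L-BFGS-B", bounds=[(0.0, None)] * K)
    return scale * res.x


def loglik(y, X, beta, sigma2):
    """Log-likelihood up to a constant: -(p/2) log(sigma2) - r'r / (2 sigma2)."""
    resid = np.log(y) - np.log(np.maximum(X @ beta, 1e-12))
    return -0.5 * len(y) * np.log(sigma2) - float(resid @ resid) / (2.0 * sigma2)


def log_likelihood_ratio(y, X, k, sigma2):
    """Natural-log LR for fluid k: full MLE versus MLE with beta_k forced to 0."""
    X_without_k = np.delete(X, k, axis=1)  # beta_k = 0 is dropping column k
    full = loglik(y, X, fit(y, X), sigma2)
    constrained = loglik(y, X_without_k, fit(y, X_without_k), sigma2)
    return full - constrained


# Hypothetical 3-gene, 2-fluid example (columns: blood, saliva).
X = np.array([[0.90, 0.02], [0.09, 0.01], [0.01, 0.97]])
sigma2 = 0.1  # hypothetical common log-scale variance
mixture = X @ np.array([1000.0, 500.0])
pure_blood = X @ np.array([1000.0, 0.0])
print(log_likelihood_ratio(mixture, X, 1, sigma2))     # saliva in the mixture
print(log_likelihood_ratio(pure_blood, X, 1, sigma2))  # saliva in pure blood
```

A large positive value means the data are much better explained with fluid $k$ included, while a value near zero means the fluid adds nothing to the fit; the text above does not fix a numerical cutoff.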
3. Results 

3.1 Selection of mRNA biomarkers 

We designed a 'CodeSet' to probe 23 body fluid/tissue specific genes and 10 
housekeeping genes (Table 2), which is well within the 800-target technological 
capability of the system. To take advantage of the high multiplex capability of the 
system, we deliberately included biomarkers that have been demonstrated to be highly 
specific to a particular body fluid (e.g. PRM2 and SEMG1 for semen) as well as some 
that have shown a lesser degree of tissue specificity (e.g. MYOZ1 for vaginal secretions 
and MUC7 for saliva). 

3.2 Estimating expected body fluid profiles 

Our algorithm requires accurate estimates of each fluid's average gene expression 
profile; below, we describe the results of this analysis. 

Our dataset included samples of highly varying RNA concentration, and genes in 
the lower-concentration samples frequently dropped into the background noise of the 
assay. To ensure accurate estimates of each body fluid's average gene expression profile, 
we only used samples with high expression levels of housekeeping genes. As a set of 
'training samples' we took the four highest-expressing samples from each fluid type with 
the exception of saliva, where a lack of high-expressing samples limited us to three 
training samples. Supplemental Figure 1 shows the overall housekeeping gene expression 
levels in the training samples and the remaining samples. 
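The selection rule can be sketched as below. The ranking metric (the sum of a sample's housekeeping-gene counts) and the optional minimum cutoff, which is one way a fluid can end up with fewer than four training samples as saliva did here, are assumptions; the function, keys and example data are hypothetical:

```python
def select_training_samples(samples, n_per_fluid=4, min_housekeeping=None):
    """Pick the highest-expressing samples of each fluid as training samples.

    samples: list of dicts with keys "id", "fluid" and "housekeeping"
    (that sample's housekeeping-gene counts). Samples are ranked by total
    housekeeping counts; an optional minimum total excludes weak samples,
    which can leave a fluid with fewer than n_per_fluid training samples.
    """
    by_fluid = {}
    for s in samples:
        by_fluid.setdefault(s["fluid"], []).append(s)
    chosen = {}
    for fluid, group in by_fluid.items():
        ranked = sorted(group, key=lambda s: sum(s["housekeeping"]), reverse=True)
        if min_housekeeping is not None:
            ranked = [s for s in ranked if sum(s["housekeeping"]) >= min_housekeeping]
        chosen[fluid] = [s["id"] for s in ranked[:n_per_fluid]]
    return chosen


# Hypothetical samples: only three saliva samples clear the cutoff.
samples = [{"id": f"B{i}", "fluid": "blood", "housekeeping": [100 * i, 50]} for i in range(1, 7)]
samples += [{"id": f"S{i}", "fluid": "saliva", "housekeeping": [40 * i, 10]} for i in range(1, 7)]
print(select_training_samples(samples, min_housekeeping=150))
```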

Per our model described in Section 2.5.3, we are interested in the relative 
expression levels of the genes within each body fluid; that is, in the proportion of total 
signature gene expression expected from each gene in a given body fluid. (This is in
contrast to most gene expression-based classifiers, which are more interested in each 
gene's absolute expression level. Since it is unrealistic to expect a housekeeping gene to 
be invariant across body fluid types, normalizing our data to attain "absolute" expression 
levels is impossible.) Therefore, we globally normalized each sample, rescaling it so
that the sum of all its expression values was 1 and each gene's expression value was its
proportion of the total signature gene expression. We then estimated each gene's
expected proportion of expression in each fluid with its mean normalized expression 
value within each fluid. 
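The normalization and averaging steps just described amount to dividing each sample by its total and then averaging within fluids. A minimal NumPy sketch, assuming a samples-by-genes count matrix restricted to the signature genes; the counts shown are hypothetical.

```python
import numpy as np


def normalize_samples(counts):
    """Rescale each sample (row) so its signature-gene counts sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def expected_profiles(counts, fluids):
    """Mean normalized profile per fluid: each gene's expected share of expression."""
    props = normalize_samples(counts)
    fluids = np.asarray(fluids)
    return {str(f): props[fluids == f].mean(axis=0) for f in np.unique(fluids)}


# Hypothetical counts: 3 samples x 3 genes, two fluid labels.
counts = [[10, 30, 60],
          [20, 20, 60],
          [50, 25, 25]]
profiles = expected_profiles(counts, ["a", "a", "b"])
print(profiles)
```

Because every normalized sample sums to 1, each fluid's averaged profile also sums to 1, which is what lets it be read directly as the expected proportion of expression from each gene.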

The five body fluids and skin demonstrated highly distinct gene expression
profiles: although signature gene expression varied between samples of the same fluid,
the differences between fluids were much greater.

Figure 2 shows the expected proportion of total expression for each gene in each 
fluid. Supplemental Figure 2 shows the consistency of these profiles in the training data, 
and Supplemental Figure 3 organizes the information in Figure 2 by gene rather than by 
fluid. In all fluids the average expression profile exhibits elevated expression of the 
fluid's putative characteristic genes, although this trend was distinctly weaker in saliva 
samples. 

HBB expression dominated the blood profiles, far exceeding the other blood
markers ALAS2 and ANK1; ALAS2 levels in blood nonetheless greatly exceeded those of
the remaining genes. The putative blood marker ANK1 was not enriched in blood samples,
surprisingly appearing most prominently in saliva samples instead. Expression in semen 
samples came almost entirely from the semen-specific genes PRM2, TGM4 and SEMG1, 
although other genes, particularly HBB, were detectable. Saliva samples had the most
diffuse profile, with the saliva-specific genes STATH, MUC7 and HTN3 contributing
only 28% of total measured expression. Vaginal secretion samples had highly elevated 
levels of the vaginal markers DKK4, CYP2B7P1 and to a lesser extent FUT6. Menstrual 
blood samples alone showed elevated expression of their characteristic genes MMP10 
and LEFTY2. Unsurprisingly, menstrual blood samples also contained blood (HBB, 
ALAS2) and vaginal secretion (CYP2B7P1) biomarkers. Skin samples showed elevated 
expression of the skin genes LCE1C, IL1F7 and CCL27, although these genes were also 
slightly elevated in vaginal secretions and menstrual blood. HBB was the most prevalent 
gene in the commercial skin preparation, probably due to the inevitable presence of 
contaminating endothelial tissue in such preparations. 

Most genes were present at a non-negligible proportion of total expression in the 
saliva samples. This phenomenon results from this study's lack of a good saliva marker. 
If a gene highly expressed in saliva were measured, the relative expression of the other 
fluids' characteristic genes in saliva would shrink dramatically. 
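The arithmetic behind this point can be made explicit. Since profiles are proportions of total signature expression, adding one gene whose counts equal $k$ times the current total divides every other gene's share by $1 + k$. Starting from the reported figure (STATH, MUC7 and HTN3 give about 28% of saliva expression, so other genes give about 72%), a hypothetical new saliva marker at four times the current total would cut the other genes' share to about 14%. The fold values below are illustrative assumptions, not measurements.

```python
def share_after_adding_gene(current_share, added_fold):
    """Share of total signature expression kept by a fixed group of genes after
    adding one new gene whose counts equal `added_fold` times the current total."""
    if added_fold < 0:
        raise ValueError("added_fold must be non-negative")
    return current_share / (1.0 + added_fold)


# Reported above: ~28% of saliva expression comes from STATH, MUC7 and HTN3,
# so genes characteristic of other fluids currently account for ~72%.
other_genes_share = 1.0 - 0.28
for fold in (1.0, 4.0, 9.0):  # hypothetical strengths of a new saliva marker
    share = share_after_adding_gene(other_genes_share, fold)
    print(f"new gene at {fold:g}x current total -> {share:.1%} from other genes")
```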

3.3 Using gene expression to predict the body fluid composition of samples 

Our algorithm for body fluid detection is described in detail in the Methods 
section. Below, we summarize the performance of the method in predicting the body 
fluid composition of every sample in our study. Crucially for forensic applications, our 
test appears to have extremely high specificity; in fact, it returned zero false positives in 
this study's 89 samples. 
We used a likelihood ratio cutoff of 100 to declare whether a body fluid was
detected in a given sample, and found that 53/80 single-fluid, non-duplicate samples 
(66%) gave positive results. It is worth noting that our collection of samples was not 
necessarily representative of the real-world population of forensic samples, as in many
cases we intentionally chose degraded and minuscule samples to push the limits of the
assay. Figure 3 shows the rate at which each body fluid was declared 'detected' in each 
actual fluid using an LR of 100. Supplemental Figures 4 and 5 indicate the performance 
of the algorithm in the training samples (abundant RNA) and in the remaining samples 
(low RNA quantity) respectively. The algorithm was successful in identifying the correct 
body fluid as long as the sample was abundant enough; in low input samples it detected 
blood, semen and vaginal secretions reliably while struggling to detect saliva, menstrual 
blood and skin. Across all samples, detection of blood, semen and vaginal secretions was 
nearly perfect. Menstrual blood was successfully detected 60% of the time. Blood and 
vaginal secretions were frequently detected in menstrual blood, though these cannot be 
considered false positives. Rather, it appears menstrual blood is best modeled as a 
variable mixture of blood, menstrual blood, and vaginal secretions. Saliva was 
successfully detected in only 25% of samples, likely due to the fact that the characteristic
saliva genes were not as informative as other fluids' characteristic signature genes and/or 
to the very low level of total RNA in most of the saliva samples. Skin also proved 
difficult to detect (31% success rate); however, the need to identify skin will probably be 
limited to specialized forensic cases. It is much more important to ensure that skin 
samples are not misclassified as other tissues. 
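The detection rule and the per-fluid rates plotted in Figure 3 can be sketched as follows. This is a hypothetical illustration: each sample is assumed to be a pair of its true fluid and a dictionary of per-fluid LRs, and the toy LR values are invented. The overall positive call rate reported above corresponds to 53/80 = 0.6625.

```python
from collections import Counter, defaultdict

LR_CUTOFF = 100.0  # decision threshold used in the text


def detected(lrs, cutoff=LR_CUTOFF):
    """Fluids whose likelihood ratio exceeds the cutoff."""
    return {fluid for fluid, lr in lrs.items() if lr > cutoff}


def detection_rates(samples, fluids, cutoff=LR_CUTOFF):
    """Rate at which each fluid is called in samples of each true fluid type.

    samples: list of (true_fluid, {fluid: LR}) pairs for single-source samples.
    Returns {true_fluid: {called_fluid: rate}}.
    """
    n = Counter()
    hits = defaultdict(Counter)
    for true_fluid, lrs in samples:
        n[true_fluid] += 1
        for fluid in detected(lrs, cutoff):
            hits[true_fluid][fluid] += 1
    return {t: {f: hits[t][f] / n[t] for f in fluids} for t in n}


def positive_call_rate(samples, cutoff=LR_CUTOFF):
    """Fraction of single-source samples in which the true fluid is detected."""
    return sum(t in detected(lrs, cutoff) for t, lrs in samples) / len(samples)


# Hypothetical LRs for three single-source samples.
toy = [("blood", {"blood": 1e6, "saliva": 2.0}),
       ("blood", {"blood": 50.0, "saliva": 1.0}),
       ("saliva", {"blood": 0.5, "saliva": 300.0})]
rates = detection_rates(toy, ["blood", "saliva"])
print(rates, positive_call_rate(toy))
```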
The choice of an LR > 100 cutoff for detecting fluids is arbitrary, and our algorithm
could achieve better performance with a less strict cutoff. Figure 4 shows ROC curves
of the True Positive Rate (TPR) against the False Positive Rate (FPR) for detection of
each fluid type in our data. As the LR threshold is relaxed, the test returns more positive
calls, raising both the TPR and the FPR. For the tissues with the worst performance in
our data (menstrual blood, saliva and skin), the ROC curves reveal that relaxing the LR
threshold would result in large increases in TPR without any increase in FPR.
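The ROC curves and the idea of a fluid-specific threshold can be sketched by sweeping the cutoff over the observed LRs for one fluid. In this hypothetical illustration, `present` marks whether the fluid is truly in each sample and the LR values are invented. `zero_fp_threshold` returns the most lenient cutoff that still yields no false positives in the given data; a threshold chosen this way is in-sample and would need validation on independent samples.

```python
import numpy as np


def rates_at(lrs, present, threshold):
    """TPR and FPR of the rule 'fluid detected if LR > threshold'."""
    lrs = np.asarray(lrs, dtype=float)
    present = np.asarray(present, dtype=bool)
    if present.all() or not present.any():
        raise ValueError("need samples both with and without the fluid")
    called = lrs > threshold
    return float(called[present].mean()), float(called[~present].mean())


def roc_curve(lrs, present):
    """(threshold, (TPR, FPR)) pairs from the strictest to the most lenient cutoff."""
    lrs = np.asarray(lrs, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(lrs)[::-1], [-np.inf]))
    return [(t, rates_at(lrs, present, t)) for t in thresholds]


def zero_fp_threshold(lrs, present):
    """Most lenient cutoff giving no false positives in these data."""
    lrs = np.asarray(lrs, dtype=float)
    present = np.asarray(present, dtype=bool)
    return float(lrs[~present].max())


# Hypothetical LRs for one fluid across six samples.
lrs = [1e4, 300.0, 40.0, 8.0, 2.0, 0.5]
present = [True, True, True, False, False, False]
print(rates_at(lrs, present, 100.0))        # behaviour at the LR > 100 rule
t0 = zero_fp_threshold(lrs, present)
print(t0, rates_at(lrs, present, t0))       # relaxed, fluid-specific cutoff
```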

3.4 Body fluid mixtures 

As a preliminary indication of the ability of the method to discern admixtures of
body fluids, five mixtures were prepared by combining ½ of a 50 µl stain or single cotton
swab from each body fluid. The mixtures comprised four binary mixtures (2 × vaginal
secretions/semen, 2 × blood/saliva) and one ternary mixture (semen/saliva/vaginal
secretions). The two blood/saliva and the two vaginal secretions/semen mixtures were
biological, as opposed to technical, replicates since the donors were different. Using an
LR of 100 as a decision threshold, two of the five mixtures were called perfectly, namely
one of the vaginal secretions/semen and one of the blood/saliva samples (Figure 5). One
of the component fluids was identified in each of the three 'false negative' mixtures:
vaginal secretions (vaginal secretions/semen and semen/saliva/vaginal secretions) and
blood (blood/saliva). In the ternary mixture the semen and saliva components had LRs
below 100 (36.9 and 3.4, respectively). In the second blood/saliva sample, the LR for
saliva was 95, falling just short of our strict bar for detection. In all but one of the
mixture samples, the component fluids were evident from their likelihood ratio profiles:
using an LR cutoff of 5, four of the five mixtures were called perfectly. Significantly, no
false positives were observed even under this very generous cutoff.
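Applying the two cutoffs to the ternary mixture shows why it remains incomplete even at LR 5. The semen and saliva LRs below are the values reported in the text; the vaginal secretions LR was only reported as exceeding 100, so the value used here is a placeholder, not a measurement.

```python
def called_fluids(lrs, cutoff):
    """Fluids whose likelihood ratio exceeds the cutoff, in alphabetical order."""
    return sorted(fluid for fluid, lr in lrs.items() if lr > cutoff)


# Semen and saliva LRs as reported; 1e3 for vaginal secretions is a placeholder.
ternary = {"vaginal secretions": 1e3, "semen": 36.9, "saliva": 3.4}
for cutoff in (100.0, 5.0):
    print(cutoff, called_fluids(ternary, cutoff))
```

Saliva's reported LR of 3.4 is below 5, so relaxing the cutoff recovers semen but not saliva in this sample.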

3.5 Development of a routine-use 5 minute RNA direct lysis method 

To facilitate routine analysis, we tested a simple 5 minute room temperature 
cellular lysis protocol as an alternative to standard RNA isolation for forensic sample 
processing using the NanoString® procedure (See Methods Section). The method is based 
upon the RLT buffer from QIAGEN, which contains a high concentration of guanidine
thiocyanate as well as a proprietary mix of detergents. β-mercaptoethanol (1% v/v) is also
added before use to inactivate RNases in the lysate. The NanoString assay involves
direct hybridization to the RNA with no enzymatic steps, and neither the denaturing
buffer nor the cellular debris in the lysate has a significant impact on the assay
results.

We compared the reproducibility of the assay between standard RNA 
isolation/purification and direct lysis protocols from the same source material. Fourteen 
of the samples in our study were compared in this manner. Supplemental Figure 6 shows 
scatterplots comparing log expression values for each of these same source samples 
between the two protocols. In general we saw excellent concordance between the two 
protocols for all genes with a moderate to high degree of expression. The correlation 
between the protocols breaks down for very lowly expressed genes, reflecting the greater
noise in the assay when measuring vanishingly small amounts of target. The most
dramatic differences between replicates (for example in samples MB-2 and BD-5) are
attributable to expected variance in RNA input amounts between lysate and purified RNA,
since lysate concentration cannot be reliably measured by current methods. The
concordance observed between the lysis and purification protocols suggests that the
simpler 5 minute lysis protocol would be an efficient option for routine forensic casework.
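One way to quantify the concordance described above is the Pearson correlation of log counts between the two protocols, optionally restricted to genes above an expression floor. In this hypothetical sketch, the counts and the `min_count` floor are assumptions, since the text does not state a cutoff. Note that a uniform difference in input amount multiplies all counts by one factor, which shifts points off the identity line on the log scale but leaves the correlation unchanged.

```python
import numpy as np


def log_concordance(lysate, purified, min_count=0.0):
    """Pearson correlation of log counts between the two protocols.

    Only genes with counts above `min_count` (a hypothetical floor) in both
    protocols are used, so the effect of near-background genes can be seen.
    """
    a = np.asarray(lysate, dtype=float)
    b = np.asarray(purified, dtype=float)
    keep = (a > min_count) & (b > min_count)
    if keep.sum() < 3:
        raise ValueError("need at least three genes above min_count")
    return float(np.corrcoef(np.log(a[keep]), np.log(b[keep]))[0, 1])


# Hypothetical counts for five genes: the three well-expressed genes differ by a
# uniform 2-fold input difference; the two low genes are dominated by noise.
lysate = [5000, 1200, 300, 8, 3]
purified = [10000, 2400, 600, 2, 9]
all_genes = log_concordance(lysate, purified)
expressed = log_concordance(lysate, purified, min_count=50)
print(f"all genes: {all_genes:.3f}, expressed genes only: {expressed:.3f}")
```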
4. Discussion

The results of this preliminary proof of principle study indicate that it is feasible 
to identify the common forensically relevant body fluids by multiplex solution 
hybridization of barcode probes to specific mRNA targets using a simple five minute 
direct lysis protocol. This simplified protocol with minimal hands-on requirement should 
facilitate routine use of mRNA profiling in casework laboratories. We first describe a 
model for gene expression in a sample from a single body fluid and then extend that 
model to mixtures of body fluids. From there we describe calculation of maximum 
likelihood estimates (MLEs) of body fluid quantities in a sample, and we describe the use 
of likelihood ratios (LR) to test for the presence of each body fluid in a sample. In 
contrast to most gene expression-based classifiers, we do not train a machine learning 
algorithm to optimize our ability to call samples correctly; rather, we define a 
biologically reasonable model of gene expression in body fluid samples and we use that 
model to evaluate the strength of evidence a sample provides for the presence of a 
particular fluid. Grounding our algorithm in sound statistical principles allows the
calculation of log-likelihoods for detection of each fluid type, making the algorithm's
results defensible in courtroom settings.

A further benefit of this principled approach is that it allows us to evaluate our 
algorithm on all samples, including those used in training: as our algorithm is based on an 
a priori model of gene expression in body fluid mixtures, and as we estimated its 
parameters without regard to model performance, the algorithm can only minimally 
overfit the training data. Our algorithm's performance in the training samples may 
therefore slightly overestimate its performance in future samples, while its performance 
in the other, low-RNA samples will considerably underestimate future performance in
high-quality samples. Although we initially used an LR of 100 as the decision threshold 
for all body fluid types, we subsequently demonstrated that it may be possible to use a 
less restrictive threshold to improve the positive call rate without generating false 
positives. Alternative approaches using body fluid-specific thresholds should be
investigated. 

While the prototype biomarker 'CodeSet' performed remarkably well in the work 
described herein, further optimization of the biomarkers will be required before the 
method can be used in casework. The HBB blood biomarker is approximately 1000-fold 
more highly expressed than ALAS2, the second-most prevalent blood marker in our data. 
This means that HBB's limit of detection (LOD) is so low that low-level contamination of
non-blood body fluids with vascular tissue products could produce false-positive blood
calls. This potentially confounding issue can be addressed by
attenuating the HBB signal with the addition of precisely defined quantities of 
specifically designed unlabeled oligonucleotides complementary to the HBB RNA prior 
to hybridization with the full CodeSet. These competitively inhibit the hybridization 
reaction with the labeled probes. 
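The amount of competitor needed can be estimated with simple arithmetic under a strong simplifying assumption: the unlabeled oligonucleotides and the labeled probes bind HBB RNA with equal affinity and are together in excess over the target, so each target molecule is captured by a labeled probe with probability $L/(L+U)$. Under that assumption an $F$-fold attenuation needs $U/L = F - 1$. The 1000-fold figure comes from the text; the 10-fold target is hypothetical, and real hybridization kinetics may differ.

```python
def competitor_ratio(attenuation):
    """Unlabeled:labeled oligo ratio for an `attenuation`-fold signal reduction.

    Assumes equal binding affinity and joint oligo excess over target, so a
    target molecule is captured by a labeled probe with probability L / (L + U).
    """
    if attenuation < 1:
        raise ValueError("attenuation must be >= 1")
    return attenuation - 1.0


def labeled_fraction(unlabeled_per_labeled):
    """Fraction of target captured by labeled probes under the same assumption."""
    return 1.0 / (1.0 + unlabeled_per_labeled)


hbb_over_alas2 = 1000.0  # approximate fold difference reported in the text
target_fold = 10.0       # hypothetical goal for HBB relative to ALAS2
ratio = competitor_ratio(hbb_over_alas2 / target_fold)
print(ratio, hbb_over_alas2 * labeled_fraction(ratio))
```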

In contrast to the need to attenuate one of the blood biomarkers, the signal for the 
saliva biomarkers needs to be enhanced. The most specific and highly expressed saliva 
biomarker is HTN3. Signal intensification could be accomplished by designing multiple 
probes that bind along a single HTN3 mRNA. In addition the current probes could be 
designed to hybridize to both HTN3 and HTN1, the latter of which is also saliva specific. 
Alternative novel biomarkers identified by RNA-Seq studies could also be employed if
the HTN3 intensification strategies fall short of expectations. 

Some of the selected biomarkers did not perform as expected. For example, the 
ANK1 blood biomarker did not demonstrate blood specificity in the NanoString® assay 
with this sample set since the expression level was low in all tissues. Re-design of some 
probe sequences may be worthwhile, but it is likely that assay performance would be 
most significantly improved by the incorporation of additional body fluid specific 
biomarkers (e.g. commensal bacteria from the vagina, such as Lactobacillus sp.). Future 
iterations of the CodeSet will evaluate the performance of additional genes. 

As a preliminary indication of the ability of the method to discern admixtures of 
body fluids, one ternary and four binary mixtures were prepared. The true fluid 
composition in four of the five mixtures was clear from their likelihood ratio profiles, and 
at least one fluid was correctly detected in all mixtures. Although these results were 
encouraging, a thorough investigation of the performance of a more optimized 
NanoString® assay with a variety of different mixtures will be necessary. 

There needs to be a note of caution with respect to the skin assay results. The 
chosen skin biomarkers were selected using total skin RNA from commercial sources due 
to the difficulties in isolating sufficient quantities of total RNA from touch samples to 
perform the hundreds of assays required for the biomarker screening and confirmation 
process. It is likely that the highly purified commercial skin samples will contain mRNAs 
that originate from multiple layers of skin including both dermal and epidermal tissue as 
well as contaminating endothelial tissue and its contents (i.e. blood), and it is likely that 
bona fide touch samples, which presumably consist mainly of cornified cells from the
epidermis, will possess a different gene expression profile from that obtained from the
commercial product. Some of the putative skin biomarkers were found in some of the 
other tissues, especially saliva (CCL27, LCE2D, IL1F7, KRT9), a finding perhaps due to 
common biomarker functions in skin and the alimentary tract or to the presence of skin 
cells in saliva. The highly expressed blood marker HBB was present in the commercial 
skin RNA preparations at comparable or higher levels than the highly expressed skin 
biomarker LCE1C, confirming the presence of contaminating endothelial tissue. In light 
of the extremely low abundance of tissue in most touch skin samples, it remains to be
seen to what degree skin biomarkers will prove generally useful in forensic
investigations. We suspect the inclusion of skin-specific genes will at a minimum help
forensic assays avoid misclassification of skin samples as other tissues. 

Housekeeping genes are typically added to gene expression assays to indicate that 
RNA of sufficient quality and quantity for analysis is present, and for normalization 
purposes [6,15,38]. Due to non-uniform expression of housekeeping genes, their value as
normalizers is questionable [48,49]. Here we show that the developed algorithm does not
require normalization with housekeeping genes. However, their presence indicates the
recovery of suitable RNA for analysis, so they remain useful in the assay.
Table 1 List of Samples Tested

| Sample Type | N | Extraction | n | Description |
|---|---|---|---|---|
| Blood | 14 | Organic Extraction | 1 | Blood stain on cotton cloth (-47°C storage after drying) |
| | | | 1 | Environmental: outside (FL); heat, sunlight, humidity, rain (1 month) |
| | | | 1 | Environmental: outside (FL); heat, sunlight, humidity, covered (3 days) |
| | | Direct Lysis (RLT) | 5 | Blood stain on cotton cloth (-47°C storage after drying) |
| Semen | 17 | Organic Extraction | 7 | Dried on cotton swabs (-47°C storage after drying) |
| | | | 2 | Environmental: outside (FL); heat, sunlight, humidity, covered (1 week) |
| | | | 3 | Sensitivity: 25 ng, 12.5 ng, 6.25 ng (input achieved by use of 5 µl of extract) |
| | | Direct Lysis (RLT) | 5 | Dried on cotton swabs (-47°C storage after drying) |
| Saliva | 17 | Organic Extraction | 1 | Dried buccal sample on cotton swabs (-47°C storage after drying) |
| | | | 1 | Environmental: outside (FL); heat, sunlight, humidity, rain (1 week) |
| | | | 1 | Environmental: outside (FL); heat, sunlight, humidity, covered (1 month) |
| | | | 3 | Sensitivity: 25 ng, 12.5 ng, 6.25 ng (input achieved by use of 5 µl of extract) |
| | | Direct Lysis (RLT) | 5 | Dried buccal sample on cotton swabs (-47°C storage after drying) |
| Vaginal Secretions | 10 | Organic Extraction | 6 | Dried sample on cotton swabs (-47°C storage after drying) |
| | | | 1 | Environmental: outside (FL); heat, sunlight, humidity, rain (3 days) |
| | | Direct Lysis (RLT) | 3 | Dried sample on cotton swabs (-47°C storage after drying) |
| Menstrual Blood | 10 | Organic Extraction | 7 | Dried sample on cotton swabs (-47°C storage after drying) |
| | | Direct Lysis (RLT) | 3 | Dried sample on cotton swabs (-47°C storage after drying) |
| Skin | 14 | Organic Extraction | 1 | Swab of surface skin (male hand); swab moistened with sterile water |
| | | | 1 | Swab of coffee cup surface; swab moistened with sterile water |
| | | | 1 | Swab of computer mouse; swab moistened with sterile water |
| | | Direct Lysis (RLT) | 1 | Swab of surface skin (male hand); swab moistened with sterile water |
| | | | 1 | Swab of coffee cup surface; swab moistened with sterile water |
| | | | 1 | Swab of computer mouse; swab moistened with sterile water |
| | | Direct Lysis (RNAGEM) | 1 | 25 bio-particles (clumps); shirt collar (male) |
| | | | 1 | 50 bio-particles (clumps); shirt collar (male) |
| | | Direct Lysis (forensicGEM) | 1 | 100 bio-particles (55 clumps/45 singles); shirt collar (male) |
| | | None | 5 | Skin total RNA (commercial source) |
| Mixtures | 5 | Organic Extraction | 2 | Vaginal/semen (1/2 swab of each donor extracted in same tube) |
| | | | 2 | Blood/saliva (1/2 stain/swab of each donor extracted in same tube) |
| | | | 1 | Semen/saliva/vaginal (1/2 swab of each donor extracted in same tube) |
| Controls | 3 | Organic Extraction | 2 | Clean sterile swab (negative control) |
| | | None | 1 | Brain total RNA* (commercial source) |

Stain = 50 µl stain; Swab = saturated body fluid swab (sterile cotton)
Environmental samples (blood, semen, saliva) were on cotton cloth
Total RNA from commercial sources (see Methods)
* Run as an internal positive control and not used in any data analysis
Table 2. Body Fluid Specific and Housekeeping Genes in the NanoString® Custom CodeSet

| Gene | Body Fluid Target |
|---|---|
| ALAS2 | Blood |
| ANK1 | Blood |
| HBB | Blood |
| LEFTY2 | Menstrual Blood |
| MMP10 | Menstrual Blood |
| HTN3 | Saliva |
| MUC7 | Saliva |
| STATH | Saliva |
| PRM2 | Semen |
| SEMG1 | Semen |
| TGM4 | Semen |
| CCL27 | Skin |
| IL1F7 | Skin |
| KRT9 | Skin |
| LCE1C | Skin |
| LCE2D | Skin |
| CYP2A7 | Vaginal |
| CYP2B7P1 | Vaginal |
| DKK4 | Vaginal |
| FUT6 | Vaginal |
| IL19 | Vaginal |
| MYOZ1 | Vaginal |
| NOXO1 | Vaginal |
| B2M | Housekeeping Gene |
| COX1 | Housekeeping Gene |
| HPRT1 | Housekeeping Gene |
| PGK1 | Housekeeping Gene |
| PPIH | Housekeeping Gene |
| S15 | Housekeeping Gene |
| TCEA1 | Housekeeping Gene |
| TFRC | Housekeeping Gene |
| UBC | Housekeeping Gene |
| UBE2D2 | Housekeeping Gene |

Downloaded from http://biorxiv.org/on September 18, 2014 



Figure Legend 

Figure 1. NanoString® digital gene expression technology 

Figure 2. Average proportion of total expression for each gene in each fluid. 

The vertical axis shows the relative proportion of total gene expression attributable 
to each gene (on the log scale). For each fluid, each point shows a gene's relative 
expression in a single training sample, and each bar shows the average of the gene's 
relative expression across the fluid's training samples. Bar color indicates each gene's
putative tissue of origin.

Figure 3. Performance of the algorithm on all single-source samples. Bars 
display the rate at which each fluid is called detected in each sample type. Fluids are 
called detected if their likelihood ratio exceeds 100. 

Figure 4. ROC curves showing the algorithm's True Positive Rate (TPR) and 
False Positive Rate (FPR) for each tissue. Points indicate the performance 
achieved using an LR cutoff of 100. Relaxing this LR cutoff for detection of menstrual
blood, saliva and skin could greatly increase the TPR without increasing the FPR. 
Line color indicates body fluid: blood - red, semen - blue, saliva - green, vaginal - 
orange, menstrual blood - pink, skin - purple. 

Figure 5. Performance of the algorithm in five mixture samples. For each of five 
mixture samples, a bar plot shows the likelihood ratios for the presence of each fluid 
type. The dotted line indicates an LR of 100.
Figure 1. http://www.nanostring.com/applications/technology
Figure 2. (Panels: Blood, Semen, Saliva, Vaginal, Menstrual, Skin; vertical axis: proportion of total expression.)
Figure 3. 
Figure 4.
Figure 5.
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Supplementary Table 1. Sample Descriptions and Assay Input (Full Sample Set) 



oampie 


uescription 


extraction iyp6 


Input 
(1 ill 


Input 

fnal 

l n sJ 


-i 
i 


— r — TiTi — j — ; , 

50ul bloodstain on cotton cloth; donor 1 


otanuaru 


c . .1 

o |il 


bu ng 


7 


50ul bloodstain on cotton cloth; donor 2 


OtdnQdiU 


C 1 .1 


bu ng 


o 
o 


50ul bloodstain on cotton cloth; donor 3 


OtdnQdiQ 


K 1 il 

O |il 


bo ng 


4 


50|al bloodstain on cotton cloth; donor 4 


Standard 


5 |al 


50 ng 


r 

b 


Env. Bloodstain: outside, covered 3 day 


Standard 


5 ul 


M A 




^tionor oj 






D 


ou|ii DiooQstain on cotton ciotn, Qonor 4- 


L/1I cLL Liyblb ^t\l_i 1 J 


K 1 il 

0 |il 


NT A 


7 


OdL. jclllcll jWdU IL.ULLU11, LlllcLlJ, L1U11U1 J. 


OLdilLldl U. 


c; ul 




Q 
O 


OdL. jclllcll jWdU IL.ULLU11, Lll it. LI L L1U11U1 Z, 


OLdilLldl U 


K 1 il 

0 |11 




Q 


^^t" cpm ah c\A/:a}"\ f rnffnn HHpHV rlnnrir "3 

OdL. jtilltll jVVdU IL.ULLU11; Lll ItLl \ f LlUllUi O 


OLdilLldl LI 


^ nl 




10 


Sat. semen swab (cotton, dried); donor 4 


Standard 


Sul 


50 ng 


1 1 


Env: 50|ul semen on cotton cloth: 
outside, covcicii ± WccK [uonor oj 


Standard 


c ■ .1 


M A 
IN A 


1 ? 


I/, Cot* comon c\at"3V» Irnttnn HHpHV rlntinr 1 

/Z OdL. oClllCll jWdU IL.ULLUllj Lll 1CLI J f LIUlltJl J. 


L/ll CLL Ly jIj I I\l_i 1 1 


ni 


NA 

IN 


1 3 


Rnr*r*al cxArah frrit+on Hn'pHV rlnnnr 1 

L)LlL,L,dl JVVdU IL.ULLUl.lj Lll ItLl J f LlUiltJi J. 


OLdilLldl LI 


c; 1 ii 


^fl na 
3U n g 


1 4 


Riiffal c\A73Vi frr\ttr\Y\ Hn'pHV rlnnnr "? 

L)LlL,L,dl JVVdU IL.ULLLJ11, Lll ItLl J f LIUiltJi L, 


OLdilLldl LI 


^ nl 


^fl na 
3U n g 




DLlL.L.dl jWdU IL.ULLU11, Lll It-Ll J, LIUllUl O 


OLdilLldl LI 


^ nl 


^0 na 


ID 


tjuccai swau ^cotton, onenj, uonor ^ 


otanuaru 


Sul 


bu ng 


1 7 
1 / 


Env: 50ul saliva on cotton cloth: 
outside, covered 1 month (donor 5) 


Standard 


b |Lll 


t>0 ng 


18 


Yz buccal swab (cotton, dried); donor 6 


Direct Lysis (RLT) 


Sul 


NA 


19 


Yi Vaginal swab (cotton, dried); donor 1 


Standard 


5 ul 


m 

50 ng 


20 


Yi Vaginal swab (cotton, dried); donor 2 


Standard 


Sul 


50 ng 


21 


% Vaginal swab (cotton, dried); donor 3 


Standard 


5 ul 


50 ng 


22 


Yi Vaginal swab (cotton, dried); donor 4 


Standard 


Sul 


50 ng 


23 


Env: % vaginal swab: 


Standard 


5ul 


50 ng 




outside, uncovered 3 days (donor 5) 






24 


Yz Vaginal swab (cotton, dried); donor 2 


Direct Lysis (RLT) 


5|il 


NA 


25 


Y2 menstrual blood swab (cotton; dried) 
donor 1, day 2 of menstruation 


Standard 


5ul 


50 ng 


26 


Yz menstrual blood swab (cotton; dried) 
donor 2 


Standard 


5ul 


50 ng 


27 


Yz menstrual blood swab (cotton; dried) 
donor 3, day 1 of menstruation 


Standard 


5ul 


50 ng 


28 


Yz menstrual blood swab (cotton; dried) 
donor 4, day 2 of menstruation 


fj 1 1 

Standard 


5 |al 


50 ng 


29 


Y2 menstrual blood swab (cotton; dried) 
donor 5, Day 3 of menstruation 


Standard 


5ul 


50 ng 


30 


Y2 menstrual blood swab (cotton; dried) 
donor 1 


Direct Lysis (RLT) 


Sul 


NA 


31 


Skin - total RNA (commercial source) 


None 


5ul 


50 ng 


32 


Skin - total RNA (commercial source) 


None 


Sul 


50 ng 


33 


Skin - total RNA (commercial source) 


None 


Sul 


50 ng 
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34 

*J I 


*sVin — total R1\TA frommprrial ^onrrpl 

-JlYlll LL/Lcll Ivllfl 1 L \J 1 1 1 1 1 1 C 1 Llcll JUU1 LC 1 


None 


^ ill 


50 nu 


35 

•J .J 


Snrfarp Q\A/ab f\A/iiolp"i of rnmni ltpr mniKP 

Jl111c1L.C J WaU 1 VVIIUIC J kJL L VJ 1 1 1 IJ LI LC 1 IIIUUjC 


^IfanHarH 

JLclllLlcll LI 


^ 111 


NA 


3fi 


Snrfarp ^XA/ab fvA/holpl of rommitpr mniKP 

JllllclL-C J\vaLt 1 VVIIUIC J kJL L VJ 1 1 1 IJ LI LC 1 IIUJUjC 


Direct I vsis fRI Tl 


^ 111 


NA 


37 

•J / 


Qprnpri f Honor 71 — Hilntion Qprip^ 

JC111C11 1 LI KJ 1 1 \J 1 J LlllLlllL/11 oCl ICj 


StanHarH 

iJLClllLlCll LI 


^ nl 




38 


Semen (donor 2) - dilution series 


Standard 


5 |il 


12.5 ng 




C owi t~\ \~\ I r\ t~\ k\ r\ v~ si /"iilii1-i/~*v-\ caviar 

jcuicn ^uonor zj — dilution series 


OtdnQdi Q 


K i il 

D |il 


6.25 ng 




OdllVd ^LlUllUl ±J LlllllllUll bcllcii 


DLdllLldl LI 


K i il 

o |il 


25 ng 


41 


Saliva (donor 1) - dilution series 


Standard 


5ul 


12.5 ng 




Saliva (donor 1) - dilution series 


Standard 


5 LXl 


b.Zo ng 


43 


Human Brain - total RNA (commercial 
source) 


None 


5ul 


m 

50 ng 


A A 

44 


Extraction blank (blank/clean swab) 


C j J 1 

Standard 


5nl 


NA 


45 


100 bio-particles (55 clumps/45 singles); 
male shirt collar 


Direct Lysis (rbj 


5 ul 


M A 

NA 


46 


Vaginal (donor3) -semen (donor 1) mixture 


Standard 


5|il 


50 ng 




ft / n „ . . . ~ L- „ f 1 „i_ -\ 

(1/2 swab or each) 






47 


Blood (donor 1) -saliva (donor 2) mixture 


Standard 


5 ul 


50 ng 




(1/2 swab of each) 








ociTicn [uonor ij-sdiivd ^tionor zj-vdgindi 
f 1 f? Qw^h of parhl 

IX / jVvCIU L/l CclLll 1 


otdnQdiQ 


K i il 

D Ul 


i>u ng 


49 


V? ^Oiil blooH<;tain on rotton rloth" Honor f\ 

/Z >J L/ Lll U1UU Li J LCtlil Llil L.LI L LVJ 1 1 L 1 VJ LI 1 , LI Ll 11 Lll kJ 


Standard 

lJ LCll 1L1C11 LX 


1 0 ul 


60 ne 


50 


1/) ^Oiil blooHc:tain on rotton rloth" Honor f\ 

/ Z. ■JL/Lll Ul\J\J\A.J Icllll kJH ClILUJII ClLllll. LHJIUJI KJ 


Direct Lvsis f RLTI 

\—J X X V— L- L 1— J y J 1J IX VU X V 


5 ul 


NA 


51 


Technical rpnlirate* of #S0 

X vLlllllUUl 1 V_ IJ11 vCl LV_, KJ 1 f f KJ 


Direct Lvsis f RLTI 

I_V X X V_ L- L 1— J y J 1J IX VU X 1 


1 0 ul 

± KJ Lll 


NA 


52 


V-? ^Oiil blooH'stain on rotton rloth" Honor 1 

/Z JU Lll U1UU Li J LCtlil KJ 11 L.LJ I LLJ 1 1 L 1 L/ LI 1 , LlL/llL/l / 


Standard 

*j LCXX X LXCX X LX 


8 ul 

kJ LL1 


104 ne 


53 


V? ^Oiil blooH'stain on rotton rloth" Honor 1 

/Z JU Lll U1UU Li J LCtlil KJ 11 L U LLL* 11 L 1 L/ LI 1 , LlL/llL/l / 


Direct Lvsis f RLTI 

i—f X X V_ L> L 1— J y ij lij IX VXJ X J 


5 ul 

«J Lll 


NA 


54 


V? ^Oiil blooH<;tain on rotton rloth" Honor R 

/Z >J L/ Lll U1UU Jj LCtlil L/ll \,\J L LL/ 1 1 L. 1 L/ LI 1 , L1L/11L/1 (J 


Direct Lysis (RLT) 


5 ul 

«J LL1 


NA 


55 


i/> ^Oiil hlooH^tain on rotton rloth* Honor R 

/Z >JL/LL1 U1UU Jj LCtlil L/ll \,\J L LL/ 1 1 L 1 L/ LI 1 , L1L/11L/1 U 


Direct Lvsis f RLTI 

1 ' X X V_ L- L 1— J y J U IX VXJ X 1 


1 0 ul 


NA 


56 


Sat spmpn swdb frotton dried Y donor 6 

/ Z ^— ' U La O X 11V^ AX J VV CI I / 1 V— V / L L V / 11; V. 1 X X V_ V. 1 J j VX V / X X V / X V— » 


Standard 

\-J LCXX X LXCX X LX 


4 ul 

i LL1 


108 ng 


57 


Sat semen swab frotton driedV donor 6 

/ Z CI La O X ll\y XX J VV CI I / 1 \-> KJ L L V / 11; V. 1 X X V_ V. 1 J ; LX V/ X X V / X V— ' 


Direct Lvsis f RLTI 

\—J X X V— L- L 1— J y J i J IX VXJ X V 


5 ul 

J Lll 


NA 


58 


1/) Sat ?prnpti cwah frotton HripHY Honor 1 

/Z JCILa oCUlCll J VV CXU 1 LU L LL/ 1 1, LI 1 1 C LI J , L1L/11L/1 / 


StanHarH 

■JLClllLlCll LI 


K 3 nl 


101 ng 


59 


Sat (ipitipti ^wari frotton HripHY Honor 7 

/Z lJ C L L ■ OL111L11 O VV CIU 1 LUL LU 1 1 f vt 1 1L VX 1 , VX U11U1 / 


Direct Lvsis fRLTl 

LJ 11 LL L Ll V JlJ 1 1 KM— J I J 


5 ul 

J Lll 


NA 


60 


Technical renlicate of #59 

X LLlllllLUl X V_ yj 1 1 V_* C L L KJ L IT S 


Direct Lvsis fRLTl 

\—S X X L L L LJ y JlJ IX VXJ X 1 


1 0 ul 

-L W Lll 


NA 


61 


V? Sat semen swab fcotton driedV donor 8 

/ Z ' CX La *_> XXX XX J VV CX 1— / I w V/ U LV/ 11/ ^1 1 1 ^- ^1 I / C-l \J X X X u 


Direct Lvsis fRLTl 

\—S X X L L L LJ y JlJ IX VXJ X J 


5 ul 

•J Lll 


NA 


62 


Sat semen swab fcotton driedV donor 8 

/ Z ' CX La *_> XXX L^ XX J VV CX 1— / 1 w KJ L LL_/ X X ■ LX X X V_ LX 1 ■ LX L_/ X X KJ X L/ 


Direct Lvsis fRLTl 

\—S X X L L L LJ y JlJ IX VXJ X J 


1 0 ul 

J- VJ Lll 


NA 


65 


V? frp^h hnrral ^wah frottonY Honor R 

/Z 11 v O 1 1 l_/Ll\-.\-.CXl O VV CI VJ 1 \_ w L lull 1 ■ UUUU 1 LJ 


StanHarH 

l_J LCI 1 1 LIC1 1 LX 


1 0 ul 

J- VJ Lll 


470 ng 


66 


V? frp^h hnrral ^wah frottonY Honor R 

/Z 11 v O 1 1 l_/ LI L- CI 1 JVVUU 1 \_ w L LU X 1 I ■ LX U11U1 LJ 


Direct Lvsis fRLTl 

l—s 11 L L L 1—1 y JlJ \ 1 V J J I J 


5 ul 

•J Lll 


NA 


67 


Technical replicate of #66


Direct Lysis (RLT)


10 µl


NA 


68 


½ fresh buccal swab (cotton); donor 9


Direct Lysis (RLT)


5 µl


NA 


69 


½ fresh buccal swab (cotton); donor 9


Direct Lysis (RLT)


10 µl


NA 


70 


½ fresh buccal swab (cotton); donor 9


Direct Lysis (RLT)


5 µl


NA 


71 


½ fresh buccal swab (cotton); donor 9


Direct Lysis (RLT)


10 µl


NA 


72 


½ vaginal swab (cotton; dried); donor 6


Standard


1 µl


332 ng 


73 


½ vaginal swab (cotton; dried); donor 6


Direct Lysis (RLT)


5 µl


NA 


74 


½ vaginal swab (cotton; dried); donor 7


Standard


1 µl


255 ng 


75 


½ vaginal swab (cotton; dried); donor 7


Direct Lysis (RLT)


5 µl


NA 


76 


½ menstrual blood swab (cotton; dried); donor 6, day 2 of menstruation


Standard


1 µl


118 ng


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77 


½ menstrual blood swab (cotton; dried); donor 6, day 2 of menstruation


Direct Lysis (RLT)


5 µl


NA 


78 


½ menstrual blood swab (cotton; dried); donor 7


Standard


3.6 µl
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0 Lll 
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NA
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Technical replicate of #79
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Swab of human skin (male; hand, left)


Standard 


10 µl


80 ng
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Swab of human skin (male; hand, right)


Direct Lysis (RLT)


^ ill 
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Swab of metal coffee cup surface (side 1) 


Standard 


8.3 µl
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O <J 


JvvaU Ul lllCLcll L,Ulltt L,ULJ jLII1cH_C lolLlC L> J 
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L/1I CLL LVjlj I Ivl_i 1 1 


^ nl 
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NA 
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25 bio-particles (clumps); male shirt collar 


Direct Lysis (RG) 


5 µl


NA 


Do 


50 bio-particles (clumps); male shirt collar


Direct Lysis (RG)


K 1 il 
0 LL1 


NA
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Env: 50 µl semen on cotton cloth;
outside, covered 1 week (donor 9)


Standard 


1.3 µl


100 ng 




50 µl bloodstain on cotton cloth; donor 9
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T 1 . .1 
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Vaginal (donor 4)-semen (donor 9) mixture 
(1/2 swab of each) 


Standard 


1.0 µl


164 ng 


Q 9 


Env: 50 µl saliva on cotton cloth;
outside, covered 1 week (donor 10)


Standard 


7.7 µl


100 ng
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½ Sat. semen swab (cotton; dried); donor 10


Standard 


4.3 µl


99 ng 


Q /I 


blood (donor 10)-saliva (donor 7) mixture
(1/2 swab of each)


Standard 


2.0 µl


QQ y,™ 

Vo ng 
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Extraction blank (blank/clean swab) 


Standard 


5.0 µl


0 ng
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dried buccal swab (cotton); donor 1 


Standard 


1.0 µl


133 ng 
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Env: 50 µl blood on cotton cloth;
outside, uncovered 1 month (donor 11)


Standard 


2.0 µl


106 ng 


98 


Skin - total RNA (commercial source) 


Standard 


2.0 µl


100 ng 



Env = environmental; Direct Lysis (FG) = forensicGEM™; Direct Lysis (RG) = m4GEM™
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Supplementary Figure Legend 

Supplementary Figure 1. Housekeeping gene expression in training and test
samples.

Supplementary Figure 2. Profiles of the training samples from each fluid are 
plotted against each other. 

Supplementary Figure 3. Boxplots of individual genes' proportion of total
expression in the different sample types. BD = blood, SE = semen, SA = saliva,
MB = menstrual blood, VS = vaginal secretions, SK = skin.
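As a rough illustration of the quantity plotted in Supplementary Figure 3, the sketch below computes each gene's proportion of total expression within a single sample. It is a minimal sketch, not the authors' code: it assumes the denominator is simply the sum of counts over the genes supplied, because the legend does not say whether housekeeping genes or any normalization step enter the total. The function name `expression_proportions` and the counts are hypothetical.

```python
from typing import Dict


def expression_proportions(counts: Dict[str, float]) -> Dict[str, float]:
    """Return each gene's share of the sample's total counts.

    Assumption: the total is the sum over all genes passed in; the legend
    does not specify which genes (e.g. housekeeping) are included.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {gene: c / total for gene, c in counts.items()}


# Hypothetical counts for one sample, not data from this study.
sample = {"HBB": 900.0, "ALAS2": 60.0, "PRM2": 20.0, "HTN3": 20.0}
props = expression_proportions(sample)
print(props["HBB"])
```

One boxplot per gene is then the distribution of that gene's proportion across all samples of a given type.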

Supplementary Figure 4. Performance of the algorithm in the training set. Bars 
display the rate at which each fluid is called detected in each sample type. Fluids are 
called detected if their likelihood ratio exceeds 100. 

Supplementary Figure 5. Performance of the algorithm in the test set. Bars 
display the rate at which each fluid is called detected in each sample type. Fluids are 
called detected if their likelihood ratio exceeds 100. 
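The detection rule in Supplementary Figures 4 and 5 (a fluid is called detected when its likelihood ratio exceeds 100) and the per-sample-type rates shown by the bars can be sketched as follows. This assumes the likelihood ratios have already been computed for every sample by the model described in the main text; the input layout, the names `detection_rates` and `LR_THRESHOLD`, and the demo values are hypothetical, and a fluid with no reported ratio is treated as not detected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FLUIDS = ["BD", "SE", "SA", "MB", "VS", "SK"]
LR_THRESHOLD = 100.0  # a fluid is called detected if its LR exceeds 100


def detection_rates(
    samples: List[Tuple[str, Dict[str, float]]],
) -> Dict[str, Dict[str, float]]:
    """Fraction of samples of each true type in which each fluid is called detected.

    `samples` holds (true_type, {fluid: likelihood_ratio}) pairs; this layout
    is a hypothetical choice for illustration.
    """
    n_by_type: Dict[str, int] = defaultdict(int)
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for true_type, lrs in samples:
        n_by_type[true_type] += 1
        for fluid in FLUIDS:
            # Strict ">" so an LR of exactly 100 is not a call.
            if lrs.get(fluid, 0.0) > LR_THRESHOLD:
                hits[true_type][fluid] += 1
    return {t: {f: hits[t][f] / n for f in FLUIDS} for t, n in n_by_type.items()}


# Hypothetical LRs for three blood samples, not values from this study.
demo = [
    ("BD", {"BD": 1e6, "MB": 50.0}),
    ("BD", {"BD": 3e4, "MB": 250.0}),
    ("BD", {"BD": 80.0}),
]
rates = detection_rates(demo)
print(rates["BD"]["BD"], rates["BD"]["MB"])
```

Each bar in the figures corresponds to one `rates[true_type][fluid]` entry; off-diagonal entries above zero would be false-positive calls.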

Supplementary Figure 6. Concordance of the assay between purification and 
lysis protocols. For the 14 samples with replicates run under each protocol, the 
natural log gene expression profile under the lysis protocol (vertical axis) is plotted 
against the profile under the purification protocol (horizontal axis). 
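A minimal sketch of the comparison in Supplementary Figure 6: build the natural-log expression profile of one sample under each protocol and summarize their agreement with a Pearson correlation. The legend reports only a scatter plot, so the correlation summary, the pseudocount of 1 used to keep zero counts finite, and all names and values below are assumptions for illustration rather than the authors' procedure.

```python
import math
from typing import Dict, List


def log_profile(counts: Dict[str, float], genes: List[str],
                pseudocount: float = 1.0) -> List[float]:
    """Natural-log expression profile over a fixed gene order.

    The pseudocount (an assumption) keeps genes with zero counts finite.
    """
    return [math.log(counts[g] + pseudocount) for g in genes]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts for one sample processed both ways (not study data).
genes = ["HBB", "ALAS2", "SEMG1", "HTN3"]
purified = {"HBB": 5000.0, "ALAS2": 400.0, "SEMG1": 3.0, "HTN3": 10.0}
lysate = {"HBB": 4200.0, "ALAS2": 350.0, "SEMG1": 5.0, "HTN3": 8.0}

x = log_profile(purified, genes)  # horizontal axis (purification protocol)
y = log_profile(lysate, genes)    # vertical axis (lysis protocol)
print(round(pearson(x, y), 3))
```

Points lying near the diagonal of the scatter plot, and a correlation close to 1, would indicate that the two protocols give concordant profiles.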
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Supplementary Figure 1. 
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Vaginal training sample 1
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Semen training 
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Supplementary Figure 3.
Boxplot panels, one per gene (proportion of total expression by sample type BD, SE, SA, MB, VS, SK): HBB, ALAS2, ANK1, PRM2, TGM4, SEMG1, STATH, MUC7, HTN3, FUT6, IL19, NOXO1, MYOZ1, DKK4, CYP2B7P1, KRT9, MMP10, LEFTY2, LCE1C, IL1F7, CCL27, LCE2D


Downloaded from http://biorxiv.org/on September 18, 2014 
Supplementary Figure 4. 
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Supplementary Figure 5. 
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Supplementary Figure 6. 
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